Video-Engine-Starter

Programmatic video generation using React, Remotion, and Temporal Authority.

Track 03 · Media. A developer starter kit for rendering 9:16 vertical MP4s from text briefs. The core engine runs script agents to draft scenes, calls Edge TTS to generate speech audio, measures the audio files with ffprobe, and injects exact frame durations into Remotion templates. Extracted from the production Agentic OS.

Track: Track 03 · Programmatic Media Pipelines
Runtime: Node.js 20+, FFmpeg / ffprobe
Tech Stack: React, TypeScript, Remotion, Edge TTS
Cost: Zero cost ($0) on free endpoints
Repository: Open source on github.com

Video Engine Pipeline : Overview

Run script agent, download speech tracks, extract length via ffprobe, and compile frames in Remotion.

Programmatic Media Pipeline : Architecture Diagram

Detailed DAG workflow showing parallel execution of visual generation and caption alignment.

The problem

Building programmatic video by awaiting async work inside React components is a recipe for broken layouts. React is built for rendering UI state, not for coordinating multi-second media buffers. If you fetch speech tracks or calculate text animation timings on the fly inside Remotion compositions, frame timing becomes unreliable: when the video renders, overlays drift out of sync with the audio track.

Programmatic video requires a **Temporal Authority**: every timing decision must be made before React starts compiling frames. The Node.js orchestrator generates the assets, reads their durations in milliseconds with CLI utilities, converts those durations into integer frame counts at your target framerate, and passes those integers in as React props. React acts as a flat renderer that reads numbers and does no timing math.
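To make the idea concrete, here is a minimal sketch of how measured durations could be turned into a timeline before React is involved. The names `SceneTiming`, `ReelProps`, `msToFrames`, and `buildReelProps` are hypothetical and are not taken from the starter's source; the actual prop shape in the repository may differ.

```typescript
// Hypothetical prop shapes; the starter's real types may differ.
interface SceneTiming {
  id: string;
  from: number;             // first frame of the scene on the timeline
  durationInFrames: number; // integer length of the scene
}

interface ReelProps {
  fps: number;
  durationInFrames: number; // total length of the composition
  scenes: SceneTiming[];
}

// Convert a measured duration to whole frames.
// Rounding up means the audio is never cut off by a short scene.
function msToFrames(durationMs: number, fps: number): number {
  return Math.ceil((Math.round(durationMs) * fps) / 1000);
}

// Lay scenes end to end and compute each start offset ahead of rendering.
function buildReelProps(
  measured: { id: string; durationMs: number }[],
  fps: number,
): ReelProps {
  let cursor = 0;
  const scenes = measured.map(({ id, durationMs }) => {
    const durationInFrames = msToFrames(durationMs, fps);
    const scene: SceneTiming = { id, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return scene;
  });
  return { fps, durationInFrames: cursor, scenes };
}
```

Working in integer milliseconds and multiplying before dividing keeps floating-point noise from nudging `Math.ceil` up by an extra frame. The React side would then only read `from` and `durationInFrames` and never compute them.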

How it works: step by step

  • Step 1: Brief-to-Script. The user submits a text brief. A script agent, whose LLM calls go through the fallback router, writes a structured JSON document: a hook sentence, five narrative segments, visual hints, and a call to action (CTA).
  • Step 2: Speech Generation. The segment texts are parsed and sent to Edge TTS. Edge TTS generates clean, human-like voice recordings as MP3 files under 60 seconds at zero cost.
  • Step 3: Temporal Frame Math. The orchestrator queries the voice MP3 files using ffprobe to read the exact audio duration down to the millisecond. It multiplies the seconds by the target framerate (e.g. 30 fps) and rounds up to get an integer frame duration.
  • Step 4: Image Collection. Visual cues are extracted from the script, and matching scene images are fetched from Pollinations.ai or Stable Diffusion endpoints to build the visual layer.
  • Step 5: Remotion Build. The frame durations are passed as composition props to Remotion. Remotion mounts the components, synchronizes the voice track, overlays text captions, and renders a 1080x1920 MP4 file.
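Step 3 can be sketched as follows. This is an illustrative version under stated assumptions, not the code in `pipeline/temporal-authority.mjs`: the function names are hypothetical, and it assumes `ffprobe` is available on the `PATH`. The ffprobe arguments ask for the container duration only, printed as a bare number of seconds.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Parse ffprobe's plain-text duration (seconds, e.g. "12.345000") into integer ms.
function parseFfprobeDurationMs(stdout: string): number {
  const seconds = Number.parseFloat(stdout.trim());
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`ffprobe returned no usable duration: "${stdout.trim()}"`);
  }
  return Math.round(seconds * 1000);
}

// Ask ffprobe for the duration of one audio file.
async function probeDurationMs(filePath: string): Promise<number> {
  const { stdout } = await execFileAsync("ffprobe", [
    "-v", "error",                               // suppress banner and warnings
    "-show_entries", "format=duration",          // only the container duration
    "-of", "default=noprint_wrappers=1:nokey=1", // print the bare value
    filePath,
  ]);
  return parseFfprobeDurationMs(stdout);
}

// Measure a voice track and round up to whole frames at the target framerate.
async function framesForAudio(filePath: string, fps: number): Promise<number> {
  const durationMs = await probeDurationMs(filePath);
  return Math.ceil((durationMs * fps) / 1000);
}
```

Keeping the parser separate from the child process call makes the timing logic easy to unit-test without audio files on disk.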

Programmatic Media Tooling

The pipeline targets free developer keys and APIs to allow rendering without runtime costs:

| Step | API / Engine | Price |
| --- | --- | --- |
| LLM Scripting | NVIDIA NIM Developer Console → Google Gemini Free API | $0.00 (Free tiers) |
| Text-to-Speech | Edge TTS (reverse-engineered Microsoft speech endpoint) | $0.00 (No key required) |
| Image Generation | Pollinations.ai (Flux & SDXL wrappers) | $0.00 (No key required) |
| Video Compilation | Remotion CLI + FFmpeg + ffprobe package | $0.00 (Local compiler) |

File Architecture

  • pipeline/run.mjs: The coordinator. Calls script, voice, and visual steps, writes output data files, and fires the Remotion compiler.
  • pipeline/llm-router.mjs: The resiliency layer. A fallback router that moves down the list of API keys when an endpoint fails or is rate-limited.
  • pipeline/temporal-authority.mjs: Executes the ffprobe child process to read audio length and convert to frame integers.
  • src/Root.tsx: Declares composition structures and coordinates frame props inside Remotion.
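The fallback behaviour described for `pipeline/llm-router.mjs` could look roughly like the sketch below. The `Provider` interface and `completeWithFallback` function are hypothetical illustrations, not the repository's API; each provider is assumed to reject its promise when the endpoint fails or is rate-limited.

```typescript
// Hypothetical provider shape; the real router may use a different interface.
interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Try each provider in order and return the first successful completion.
async function completeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Record why this provider was skipped, then move to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

Collecting the failure reasons means that when every provider is down, the final error shows the whole chain instead of only the last attempt.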

How to run it

git clone https://github.com/shubham0086/video-engine-starter
cd video-engine-starter
npm install
cp .env.example .env

# Compile a video on Deep Sleep
node pipeline/run.mjs "The science of deep sleep"

# Open Remotion Studio to preview
npx remotion studio

# Compile video file to MP4
npx remotion render BasicReel out/deep-sleep.mp4

Where this fits

Video-Engine-Starter is the **programmatic media output** layer. It acts as Stage 4 of the Agentic OS video automation pipeline:
Brief Intake → Research → Outlines → [Video Engine Starter] → Publishing QA → Distribution

Honest framing

This is a developer starter kit, not a plug-and-play SaaS system. It renders basic slides, text layouts, captions, and static images. If you require advanced features like keyframe animations, audio filters, custom transitions, or multi-track audio layering, you will need to write custom React/Remotion templates. It is a foundation for building video programmatically.